Empirical Methods for MT Lexicon Development
Abstract
This article reviews some recently invented methods for automatically extracting translation lexicons from parallel texts. The accuracy of these methods has been significantly improved by exploiting known properties of parallel texts and of particular language pairs. The state of the art has advanced to the point where translations can be found automatically and with high reliability, even for non-compositional compound phrases that are not translated word for word. Crucially, all of these methods can be smoothly integrated into the usual workflow of MT system developers. Partial automation of MT lexicon construction is likely to produce more accurate results more efficiently.
Related resources
Reuse of linguistic resources in MT
Machine translation (MT) draws more heavily on lexical resources than most other NLP applications. First, grammars of both source and target languages require lexicons. Second, some sort of mapping between lexicons is required in order to transfer information from a source to a target language. The MT system described here is based on Shake-and-Bake technology and uses lexical transfer as the i...
An Empirical Architecture for Verb Subcategorization Frame - a Lexicon for a Real-world Scale Japanese-English Interlingual MT
The verb subcategorization frame information plays a major role of disambiguations in many NLP applications. Japanese, however, imposes difficulties of subcategorizing in part because it allows arbitrary ellipses of case elements. We propose a new type of verb subcategorization frame code set that combines the verb's surface case set and the deep case set, as a solution to the difficulties of e...
Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets
Traditional approaches to Sentiment Analysis (SA) rely on large annotated data sets or wide-coverage sentiment lexica, and as such often perform poorly on under-resourced languages. This paper presents empirical evidence of an efficient SA approach using freely available machine translation (MT) systems to translate Arabic tweets to English, which we then label for sentiment using a state-of-th...
FUDR-based MT, head switching and the lexicon
We present an MT-approach which does transfer at the level of flat underspecified discourse representation structures. It allows for natural definitions of notoriously difficult structural divergencies between source and target, like head switching, by exploiting the formal means of semantic scope. The corresponding expressive lexicon formalism allows for a lexically driven, co-descriptive tran...